152 research outputs found

    Modelling the evolution of the archeal tryptophan synthase

    BACKGROUND: Microorganisms and plants are able to produce tryptophan. Enzymes catalysing the last seven steps of tryptophan biosynthesis are encoded in the canonical trp operon. The trp genes encountered most frequently are trpA and trpB, which code for the alpha and beta subunits of tryptophan synthase. In several prokaryotic genomes, two variants of trpB (named trpB1 and trpB2) occur in different combinations. The evolutionary history of these trpB genes is under debate. RESULTS: In order to study the evolution of trp genes, completely sequenced archeal and bacterial genomes containing trpB were analysed. Phylogenetic trees indicated that TrpB sequences constitute four distinct groups; their composition is in agreement with the location of the respective genes. The first group consisted exclusively of trpB1 genes, most of which belonged to trp operons. Groups two to four contained trpB2 genes. The largest group (trpB2_o) contained trpB2 genes all located outside of operons. Most of these genes originated from species that additionally possess an operon-based trpB1. Groups three and four comprise trpB2 genes of those genomes containing exclusively one or two trpB2 genes, but no trpB1. One group (trpB2_i) consisted of trpB2 genes located inside, the other (trpB2_a) of trpB2 genes located outside the trp operon. TrpA and TrpB form a heterodimer and cooperate biochemically. In order to characterise trpB variants and stages of TrpA/TrpB cooperation in silico, several approaches were combined. Phylogenetic trees were constructed for all trp genes; their structure was assessed via bootstrapping. Alternative models of trpB evolution were evaluated with parsimony arguments. The four groups of trpB variants were correlated with archeal speciation. Several stages of TrpA/TrpB cooperation were identified and trpB variants were characterised. Most plausibly, trpB2 represents the predecessor of the modern trpB gene, and trpB1 evolved in an ancestral bacterium. 
CONCLUSION: In archeal genomes, several stages of trpB evolution, TrpA/TrpB cooperation, and operon formation can be observed. Thus, archeal trp genes may serve as a model system for studying the evolution of protein-protein interactions and operon formation.

    SIGI: score-based identification of genomic islands

    BACKGROUND: Genomic islands can be observed in many microbial genomes. These stretches of DNA have a conspicuous composition with regard to sequence or encoded functions. Genomic islands are assumed to be frequently acquired via horizontal gene transfer. For the analysis of genome structure and the study of horizontal gene transfer, it is necessary to reliably identify and characterize these islands. RESULTS: A scoring scheme on codon frequencies, Score_G1G2(cdn) = log(f_G2(cdn) / f_G1(cdn)), was utilized. To analyse genes of a species G1 and to test their relatedness to species G2, scores were determined by applying the formula to log-odds derived from mean codon frequencies of the two genomes. A non-redundant set of nearly 400 codon usage tables comprising microbial species was derived; its members were used alternatively at position G2. Genes having at least one score value above a species-specific and dynamically determined cut-off value were analysed further. By means of cluster analysis, genes were identified that comprise clusters of statistically significant size. These clusters were predicted as genomic islands. Finally, and individually for each of these genes, the taxonomical relation among those species responsible for significant scores was interpreted. The validity of the approach and its limitations were made plausible by an extensive analysis of natural genes and synthetic ones aimed at modelling the process of gene amelioration. CONCLUSIONS: The method reliably identifies genomic islands and the likely origin of alien genes.
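
The scoring scheme in this abstract can be illustrated with a minimal Python sketch. It assumes that a gene's score is the sum of the per-codon log-odds, that codon frequencies are estimated from in-frame coding sequences, and that a small pseudocount (`pseudo`) protects against codons unseen in one genome. The function names are hypothetical and are not taken from the SIGI software.

```python
import math
from collections import Counter


def codon_frequencies(genes):
    """Relative frequency of each codon over a list of in-frame coding sequences."""
    counts = Counter()
    for seq in genes:
        # Ignore a trailing partial codon, if any.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            counts[seq[i:i + 3]] += 1
    total = sum(counts.values())
    return {cdn: n / total for cdn, n in counts.items()}


def gene_score(gene, f_g1, f_g2, pseudo=1e-6):
    """Sum of log(f_G2(cdn) / f_G1(cdn)) over the codons of one gene (assumed aggregation)."""
    score = 0.0
    for i in range(0, len(gene) - len(gene) % 3, 3):
        cdn = gene[i:i + 3]
        score += math.log((f_g2.get(cdn, 0.0) + pseudo) / (f_g1.get(cdn, 0.0) + pseudo))
    return score
```

Under these assumptions, a positive score means the gene's codon usage resembles G2 more than G1; the cut-off and the subsequent cluster analysis mentioned in the abstract are not modelled here.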

    Oligo kernels for datamining on biological sequences: a case study on prokaryotic translation initiation sites

    BACKGROUND: Kernel-based learning algorithms are among the most advanced machine learning methods and have been successfully applied to a variety of sequence classification tasks within the field of bioinformatics. Conventional kernels utilized so far do not provide an easy interpretation of the learnt representations in terms of positional and compositional variability of the underlying biological signals. RESULTS: We propose a kernel-based approach to datamining on biological sequences. With our method it is possible to model and analyze positional variability of oligomers of any length in a natural way. On the one hand, this is achieved by mapping the sequences to an intuitive but high-dimensional feature space, well-suited for interpretation of the learnt models. On the other hand, by means of the kernel trick we can provide a general learning algorithm for that high-dimensional representation because all required statistics can be computed without performing an explicit feature space mapping of the sequences. By introducing a kernel parameter that controls the degree of position-dependency, our feature space representation can be tailored to the characteristics of the biological problem at hand. A regularized learning scheme enables application even to biological problems for which only small sets of example sequences are available. Our approach includes a visualization method for transparent representation of characteristic sequence features. Thereby, the importance of features can be measured in terms of discriminative strength with respect to classification of the underlying sequences. To demonstrate and validate our concept on a biochemically well-defined case, we analyze E. coli translation initiation sites in order to show that we can find biologically relevant signals. For that case, our results clearly show that the Shine-Dalgarno sequence is the most important signal upstream of a start codon. 
The variability in position and composition we found for that signal is in accordance with previous biological knowledge. We also find evidence for signals downstream of the start codon, previously introduced as transcriptional enhancers. These signals are mainly characterized by occurrences of adenine in a region of about 4 nucleotides next to the start codon. CONCLUSIONS: We showed that the oligo kernel can provide a valuable tool for the analysis of relevant signals in biological sequences. In the case of translation initiation sites we could clearly deduce the most discriminative motifs and their positional variation from example sequences. Attractive features of our approach are its flexibility with respect to oligomer length and position conservation. By means of these two parameters oligo kernels can easily be adapted to different biological problems
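
To make the idea of a position-dependent oligomer kernel concrete, here is a minimal Python sketch. It assumes a Gaussian weighting of the distance between occurrences of the same oligomer in two sequences, with a hypothetical parameter `sigma` playing the role of the kernel parameter that controls position-dependency. Constant scaling factors are omitted, and the function names are illustrative rather than the authors' implementation.

```python
import math
from collections import defaultdict


def oligo_positions(seq, k):
    """Map each k-mer of seq to the list of its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def oligo_kernel(s1, s2, k=3, sigma=1.0):
    """Sum Gaussian similarities over all pairs of occurrences of a shared k-mer.

    Small sigma: only k-mers at (nearly) identical positions contribute.
    Large sigma: the kernel approaches a position-independent k-mer count product.
    """
    p1, p2 = oligo_positions(s1, k), oligo_positions(s2, k)
    total = 0.0
    for oligo, starts in p1.items():
        for p in starts:
            for q in p2.get(oligo, ()):
                total += math.exp(-((p - q) ** 2) / (4.0 * sigma ** 2))
    return total
```

The value is computed directly from occurrence positions, so no explicit high-dimensional feature vector is built, which mirrors the kernel-trick argument in the abstract.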

    Conserved genomic neighborhood is a strong but no perfect indicator for a direct interaction of microbial gene products

    Background: The order of genes in bacterial genomes is not random; for example, the products of genes belonging to an operon work together in the same pathway. The cotranslational assembly of protein complexes is deemed to conserve genomic neighborhoods even more strongly than a common function. This is why a conserved genomic neighborhood can be utilized to predict whether gene products form protein complexes. Results: We were interested in assessing the performance of a neighborhood-based classifier that analyzes a large number of genomes. Thus, we determined the local genomic context of the genes encoding the subunits of 494 experimentally verified heterodimers. In order to generate phylogenetically comprehensive genomic neighborhoods, we utilized the tools offered by the Enzyme Function Initiative. For each subunit, a sequence similarity network was generated and the corresponding genome neighborhood network was analyzed to deduce the most frequent gene product. This was predicted as the interaction partner if its abundance exceeded a threshold, which was the frequency giving rise to the maximal Matthews correlation coefficient. For the threshold of 16%, the true positive rate was 45%, the false positive rate 0.06%, and the precision 55%. For approximately 20% of the subunits, the interaction partner was not found in a neighborhood of +/- 10 genes. Conclusions: Our phylogenetically comprehensive analysis confirmed that complex formation is a strong evolutionary factor that conserves genome neighborhoods. On the other hand, for 55% of the cases analyzed here, classification failed: either the interaction partner was not present in a +/- 10 gene window, or it was not the most frequent gene product.
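
Choosing a threshold by maximizing the Matthews correlation coefficient (MCC), as described in this abstract, can be sketched as follows. The input format (pairs of abundance and a boolean label) and the function names are assumptions made for illustration; the sketch does not reproduce the study's data or pipeline.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_threshold(scored, thresholds):
    """Return (threshold, MCC) with maximal MCC.

    scored: list of (abundance, is_true_partner) pairs (hypothetical format).
    A candidate is predicted as interaction partner if abundance > threshold.
    """
    best = None
    for t in thresholds:
        tp = sum(1 for a, y in scored if a > t and y)
        fp = sum(1 for a, y in scored if a > t and not y)
        fn = sum(1 for a, y in scored if a <= t and y)
        tn = sum(1 for a, y in scored if a <= t and not y)
        m = mcc(tp, fp, tn, fn)
        if best is None or m > best[1]:
            best = (t, m)
    return best
```

MCC is a reasonable criterion here because true interaction partners are rare compared with all neighboring gene products, and MCC remains informative for such imbalanced classes.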

    TransCent: Computational enzyme design by transferring active sites and considering constraints relevant for catalysis

    BACKGROUND: Computational enzyme design is far from being applicable for the general case. Due to computational complexity and limited knowledge of the structure-function interplay, heuristic methods have to be used. RESULTS: We have developed TransCent, a computational enzyme design method supporting the transfer of active sites from one enzyme to an alternative scaffold. In an optimization process, it balances requirements originating from four constraints. These are 1) protein stability, 2) ligand binding, 3) pKa values of active site residues, and 4) structural features of the active site. Each constraint is handled by an individual software module. Modules processing the first three constraints are based on state-of-the-art concepts, i.e. RosettaDesign, DrugScore, and PROPKA. To account for the fourth constraint, knowledge-based potentials are utilized. The contribution of modules to the performance of TransCent was evaluated by means of a recapitulation test. The redesign of oxidoreductase cytochrome P450 was analyzed in detail. As a first application, we present and discuss models for the transfer of active sites in enzymes sharing the frequently encountered triosephosphate isomerase fold. CONCLUSION: A recapitulation test on native enzymes showed that TransCent proposes active sites that resemble the native enzyme more than those generated by RosettaDesign alone. Additional tests demonstrated that each module contributes to the overall performance in a statistically significant manner

    AGeNNT: annotation of enzyme families by means of refined neighborhood networks

    Background: Large enzyme families may contain functionally diverse members that give rise to clusters in a sequence similarity network (SSN). In prokaryotes, the genome neighborhood of a gene product is indicative of its function; thus, a genome neighborhood network (GNN) deduced for an SSN provides strong clues to the specific function of the enzymes constituting the different clusters. The Enzyme Function Initiative (http://enzymefunction.org/) offers services that compute SSNs and GNNs. Results: We have implemented AGeNNT, which utilizes these services, albeit with datasets purged with respect to unspecific protein functions and overrepresented species. AGeNNT generates refined GNNs (rGNNs) that consist of cluster-nodes representing the sequences under study and Pfam-nodes representing enzyme functions encoded in the respective neighborhoods. For cluster-nodes, AGeNNT summarizes the phylogenetic relationships of the contributing species, and a statistic indicates how unique nodes and GNs are within this rGNN. Pfam-nodes are annotated with additional features like GO terms describing protein function. For edges, the coverage is given, which is the relative number of neighborhoods containing the considered enzyme function (Pfam-node). AGeNNT is available at https://github.com/kandlinf/agennt. Conclusions: An rGNN is easier to interpret than a conventional GNN, which commonly contains proteins without enzymatic function and overly specific neighborhoods due to phylogenetic bias. The implemented filter routines and the statistic allow the user to identify those neighborhoods that are most indicative of a specific metabolic capacity. Thus, AGeNNT helps to distinguish and annotate functionally different members of enzyme families.
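
The edge coverage defined in this abstract (the relative number of neighborhoods containing a given Pfam function) can be computed as in the sketch below. It assumes each neighborhood is given as a list of Pfam identifiers; the function name and data layout are hypothetical and do not reflect AGeNNT's actual code.

```python
from collections import Counter


def edge_coverage(neighborhoods):
    """Fraction of neighborhoods in which each Pfam identifier occurs at least once."""
    n = len(neighborhoods)
    counts = Counter()
    for neighborhood in neighborhoods:
        # Count each Pfam identifier once per neighborhood, even if it occurs repeatedly.
        counts.update(set(neighborhood))
    return {pfam: c / n for pfam, c in counts.items()}
```

Counting presence rather than occurrences keeps the coverage between 0 and 1, so a single neighborhood with many copies of one domain cannot inflate the value.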

    Assessing in silico the recruitment and functional spectrum of bacterial enzymes from secondary metabolism

    Background: Microbes, plants, and fungi synthesize an enormous number of metabolites exhibiting rich chemical diversity. For a high-level classification, metabolism is subdivided into primary (PM) and secondary (SM) metabolism. SM products are often not essential for survival of the organism and it is generally assumed that SM enzymes stem from PM homologs. Results: We wanted to assess evolutionary relationships and function of bona fide bacterial PM and SM enzymes. Thus, we analyzed the content of 1010 biosynthetic gene clusters (BGCs) from the MIBiG dataset; the encoded bacterial enzymes served as representatives of SM. The content of 15 bacterial genomes known not to harbor BGCs served as a representation of PM. Enzymes were categorized by their EC number and for these enzyme functions, frequencies were determined. The comparison of PM/SM frequencies indicates a certain preference for hydrolases (EC class 3) and ligases (EC class 6) in PM and for oxidoreductases (EC class 1) and lyases (EC class 4) in SM. Based on BLAST searches, we determined pairs of PM/SM homologs and their functional diversity. Oxidoreductases, transferases (EC class 2), lyases and isomerases (EC class 5) form a tightly interlinked network indicating that many protein folds can accommodate different functions in PM and SM. In contrast, the functional diversity of hydrolases and especially ligases is significantly limited in PM and SM. For the most direct comparison of PM/SM homologs, we restricted for each BGC the search to the content of the genome it comes from. For each homologous hit, the contribution of the genomic neighborhood to metabolic pathways was summarized in BGC-specific html-pages that are interlinked with KEGG; this dataset can be downloaded from https://www.bioinf.ur.de. Conclusions: Only a few reaction chemistries are overrepresented in bacterial SM and at least 55% of the enzymatic functions present in BGCs possess PM homologs. 
Many SM enzymes arose in PM and Nature utilized the evolvability of enzymes similarly to establish novel functions both in PM and SM. Future work aimed at the elucidation of evolutionary routes that have interconverted a PM enzyme into an SM homolog can profit from our BGC-specific annotations

    Experimental and computational analysis of the ancestry of an evolutionary young enzyme from histidine biosynthesis

    The conservation of fold and chemistry of the enzymes associated with histidine biosynthesis suggests that this pathway evolved prior to the diversification of Bacteria, Archaea, and Eukaryotes. The only exception is the histidinol phosphate phosphatase (HolPase). So far, non-homologous HolPases that possess distinct folds and belong to three different protein superfamilies have been identified in various phylogenetic clades. However, their evolution has remained unknown to date. Here, we analyzed the evolutionary history of the HolPase from γ-Proteobacteria (HisB-N). It has been argued that HisB-N and its closest homologue d-glycero-d-manno-heptose-1,7-bisphosphate 7-phosphatase (GmhB) have emerged from the same promiscuous ancestral phosphatase. GmhB variants catalyze the hydrolysis of the anomeric d-glycero-d-manno-heptose-1,7-bisphosphate (αHBP or βHBP) with a strong preference for one anomer (αGmhB or βGmhB). We found that HisB-N from Escherichia coli shows promiscuous activity for βHBP but not αHBP, while βGmhB from Crassaminicella sp. shows promiscuous activity for HolP. Accordingly, a combined phylogenetic tree of αGmhBs, βGmhBs, and HisB-N sequences revealed that HisB-Ns form a compact subcluster derived from βGmhBs. Ancestral sequence reconstruction and in vitro analysis revealed a promiscuous HolPase activity in the resurrected enzymes prior to functional divergence of the successors. The following increase in catalytic efficiency of the HolP turnover is reflected in the shape and electrostatics of the active site predicted by AlphaFold. An analysis of the phylogenetic tree led to a revised evolutionary model that proposes the horizontal gene transfer of a promiscuous βGmhB from δ- to γ-Proteobacteria where it evolved to the modern HisB-N

    Rosetta:MSF: a modular framework for multi-state computational protein design

    Computational protein design (CPD) is a powerful technique to engineer existing proteins or to design novel ones that display desired properties. Rosetta is a software suite including algorithms for computational modeling and analysis of protein structures and offers many elaborate protocols created to solve highly specific tasks of protein engineering. Most of Rosetta's protocols optimize sequences based on a single conformation (i.e. design state). However, challenging CPD objectives like multi-specificity design or the concurrent consideration of positive and negative design goals demand the simultaneous assessment of multiple states. This is why we have developed the multi-state framework MSF that facilitates the implementation of Rosetta's single-state protocols in a multi-state environment and made available two frequently used protocols. Utilizing MSF, we demonstrated for one of these protocols that multi-state design yields a 15% higher performance than single-state design on a ligand-binding benchmark consisting of structural conformations. With this protocol, we designed de novo nine retro-aldolases on a conformational ensemble deduced from a (βα)8-barrel protein. All variants displayed measurable catalytic activity, testifying to a high success rate for this concept of multi-state enzyme design.

    Evidence for the Existence of Elaborate Enzyme Complexes in the Paleoarchean Era

    Due to the lack of macromolecular fossils, the enzymatic repertoire of extinct species has remained largely unknown to date. In an attempt to solve this problem, we have characterized a cyclase subunit (HisF) of the imidazole glycerol phosphate synthase (ImGP-S), which was reconstructed from the era of the last universal common ancestor of cellular organisms (LUCA). As observed for contemporary HisF proteins, the crystal structure of LUCA-HisF adopts the (βα)8-barrel architecture, one of the most ancient folds. Moreover, LUCA-HisF (i) resembles extant HisF proteins with regard to internal 2-fold symmetry, active site residues, and a stabilizing salt bridge cluster, (ii) is thermostable and shows a folding mechanism similar to that of contemporary (βα)8-barrel enzymes, (iii) displays high catalytic activity, and (iv) forms a stable and functional complex with the glutaminase subunit (HisH) of an extant ImGP-S. Furthermore, we show that LUCA-HisF binds to a reconstructed LUCA-HisH protein with high affinity. Our findings suggest that the evolution of highly efficient enzymes and enzyme complexes has already been completed in the LUCA era, which means that sophisticated catalytic concepts such as substrate channeling and allosteric communication existed already 3.5 billion years ago